Range-Efficient Counting of Distinct Elements in a Massive Data Stream

Authors

  • A. Pavan
  • Srikanta Tirthapura
Abstract

Efficient one-pass estimation of $F_0$, the number of distinct elements in a data stream, is a fundamental problem arising in various contexts in databases and networking. We consider range-efficient estimation of $F_0$: estimation of the number of distinct elements in a data stream where each element of the stream is not just a single integer, but an interval of integers. We present a randomized algorithm that yields an $(\epsilon, \delta)$-approximation of $F_0$, with the following time and space complexities ($n$ is the size of the universe of the items): (1) the amortized processing time per interval is $O(\log\frac{1}{\delta}\log\frac{n}{\epsilon})$; (2) the workspace used is $O(\frac{1}{\epsilon^2}\log\frac{1}{\delta}\log n)$ bits. Our algorithm improves upon a previous algorithm due to Bar-Yossef, Kumar and Sivakumar [BYKS02], which requires $O(\frac{1}{\epsilon^5}\log\frac{1}{\delta}\log n)$ processing time per item. The algorithm can also be used to compute the max-dominance norm of a stream of multiple signals, and significantly improves upon the previous best time and space bounds due to Cormode and Muthukrishnan [CM03]. It also provides an efficient solution to the distinct summation problem, which arises during data aggregation in sensor networks [NGSA04, CLKB04].
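To make the problem statement concrete, here is a minimal Python sketch of what $F_0$ means for a stream of intervals. It is not the authors' algorithm: it computes $F_0$ exactly by sorting and merging all intervals, so it stores the whole stream instead of using a one-pass $(\epsilon, \delta)$ sketch. It also shows one way the max-dominance norm (the sum, over indices $i$, of the largest value seen at $i$) can be phrased as $F_0$ of intervals. The function names `exact_f0` and `max_dominance`, and the encoding of an item $(i, v)$ as the interval $[iV, iV + v - 1]$ with $V$ an upper bound on values, are hypothetical choices for illustration.

```python
from typing import Iterable, Tuple

Interval = Tuple[int, int]  # closed interval [lo, hi] of integers


def exact_f0(intervals: Iterable[Interval]) -> int:
    """Exact number of distinct integers covered by a stream of closed intervals.

    Baseline only: it keeps every interval, so space grows with the stream,
    unlike the one-pass (epsilon, delta) estimator described in the abstract.
    """
    spans = sorted((lo, hi) for lo, hi in intervals if lo <= hi)  # drop empty intervals
    total = 0
    cur_lo = cur_hi = None
    for lo, hi in spans:
        if cur_hi is None or lo > cur_hi + 1:
            # Disjoint from the current merged run: count it and start a new run.
            if cur_hi is not None:
                total += cur_hi - cur_lo + 1
            cur_lo, cur_hi = lo, hi
        else:
            # Overlapping or adjacent: extend the current run.
            cur_hi = max(cur_hi, hi)
    if cur_hi is not None:
        total += cur_hi - cur_lo + 1
    return total


def max_dominance(pairs: Iterable[Tuple[int, int]], max_value: int) -> int:
    """Hypothetical reduction: sum over i of max value at i, computed as F0 of intervals.

    Assumes indices i >= 0 and values v in [0, max_value]. Item (i, v) becomes
    [i*max_value, i*max_value + v - 1]; intervals for different i never overlap,
    and for a fixed i their union has exactly max(v) integers.
    """
    return exact_f0((i * max_value, i * max_value + v - 1) for i, v in pairs)


if __name__ == "__main__":
    print(exact_f0([(1, 5), (3, 8), (10, 10)]))  # 9: covers 1..8 and 10
    print(max_dominance([(0, 3), (0, 5), (1, 2), (1, 4)], 10))  # 9: 5 + 4
```

Sorting and merging costs $O(m \log m)$ time and $O(m)$ space for $m$ intervals, which is exactly the cost a range-efficient streaming estimator avoids; a baseline like this is mainly useful for checking an approximate estimator on small inputs.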

Similar resources

Counting distinct objects over sliding windows

Aggregation against distinct objects has been involved in many real applications with the presence of duplicates, including real-time monitoring of moving objects. In this paper, we investigate the problem of counting distinct objects over sliding windows with arbitrary lengths. We present novel, time- and space-efficient, one-scan algorithms to continuously maintain a sketch so that the counting c...

Range-Efficient Counting of Distinct Elements in a Massive Data Stream

Efficient one-pass estimation of $F_0$, the number of distinct elements in a data stream, is a fundamental problem arising in various contexts in databases and networking. We consider range-efficient estimation of $F_0$: estimation of the number of distinct elements in a data stream where each element of the stream is not just a single integer but an interval of integers. We present a randomized algor...

COUNTING DISTINCT FUZZY SUBGROUPS OF SOME RANK-3 ABELIAN GROUPS

In this paper we classify fuzzy subgroups of a rank-3 abelian group $G = \mathbb{Z}_{p^n} + \mathbb{Z}_p + \mathbb{Z}_p$ for any fixed prime $p$ and any positive integer $n$, using a natural equivalence relation given in [mur:01]. We present and prove explicit polynomial formulae for the number of (i) subgroups, (ii) maximal chains of subgroups, (iii) distinct fuzzy subgroups, (iv) non-isomorp...

Adaptive Bloom Filter: A Space-Efficient Counting Algorithm for Unpredictable Network Traffic

The Bloom Filter (BF), a space-and-time-efficient hashcoding method, is used as one of the fundamental modules in several network processing algorithms and applications such as route lookups, cache hits, packet classification, per-flow state management or network monitoring. BF is a simple space-efficient randomized data structure used to represent a data set in order to support membership quer...

Range Counting with Distinct Constraints

In this paper we consider a special case of orthogonal point counting queries, called queries with distinct constraints. A $d$-dimensional orthogonal query range $Q = [b_1, b_2] \times [b_3, b_4] \times \cdots \times [b_{2d-1}, b_{2d}]$ is a range with $r$ distinct constraints if there are $r$ distinct values among $b_1, b_2, \ldots, b_{2d}$. We describe a data structure that supports orthogonal range counting queries with $r$ distinct cons...

Publication date: 2005